Phoneme dependent frame selection preference
Authors
Abstract
In a previous study, we proposed algorithms that select representative frames from a segment for phoneme likelihood evaluation. In this paper, we show that this frame selection behavior is phoneme dependent. We observe that some phonemes benefit from frame selection while others do not, and that this separation matches the phonetic categories. For phonemes that are sensitive to frame selection, we find that selecting frames at pre-defined positions in the segment enhances the discrimination between phonemes. These phoneme-dependent positions are explicitly retrieved and used in a phoneme classification task. Experimental results on the TIMIT phonetic database show that the frame selection method significantly outperforms decoding by the classical Viterbi decoder.
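The idea of scoring a segment only at phoneme-dependent positions can be illustrated with a minimal sketch. The abstract does not say how positions are represented or how the selected frame scores are combined, so everything here is an assumption made for illustration: positions are relative offsets in [0, 1], selected frame log-likelihoods are averaged, and the names `select_frames`, `segment_score`, `classify`, `POSITIONS`, and `LOGLIKES` are hypothetical. It is not the authors' algorithm.

```python
def select_frames(num_frames, positions):
    """Map relative positions in [0, 1] to frame indices within a segment."""
    if num_frames <= 0:
        raise ValueError("segment must contain at least one frame")
    last = num_frames - 1
    # Clamp to the segment and drop duplicates that arise in short segments.
    return sorted({min(last, max(0, round(p * last))) for p in positions})


def segment_score(frame_loglikes, positions):
    """Average the log-likelihoods of the selected frames only (assumed rule)."""
    idx = select_frames(len(frame_loglikes), positions)
    return sum(frame_loglikes[i] for i in idx) / len(idx)


def classify(loglikes_by_phoneme, positions_by_phoneme):
    """Return the phoneme whose selected frames give the highest score."""
    return max(
        loglikes_by_phoneme,
        key=lambda ph: segment_score(
            loglikes_by_phoneme[ph], positions_by_phoneme[ph]
        ),
    )


# Hypothetical positions and per-frame log-likelihoods, not taken from the paper.
POSITIONS = {"aa": [0.5], "t": [0.25, 0.75]}
LOGLIKES = {
    "aa": [-9.0, -4.0, -2.0, -4.0, -9.0],  # frames scored under the "aa" model
    "t": [-6.0, -5.0, -6.0, -5.0, -6.0],   # same frames under the "t" model
}

best = classify(LOGLIKES, POSITIONS)
```

In this toy setup each candidate phoneme is evaluated only at its own positions, which is the contrast with accumulating a score over every frame of the segment.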
Similar resources
Single frame selection for phoneme classification
Our former study [1] has shown that maximum likelihood (ML) based frame selection, which selects reliable frames from a high resolution along the time axis, helps to improve the discrimination between phonemes. In this paper, we present our recent research on single frame selection for a phoneme classification task. A new single selection, which only selects one frame for one state in a Hidden...
Rapid unsupervised adaptation using frame independent output probabilities of gender and context independent phoneme models
Business is demanding higher recognition accuracy with no increase in computation time compared to previously adopted baseline speech recognition systems. Accuracy can be improved by adding a gender dependent acoustic model and unsupervised adaptation based on CMLLR (Constrained Maximum Likelihood Linear Regression). CMLLR-based batch-type unsupervised adaptation estimates a single global trans...
Allophone-based acoustic modeling for Persian phoneme recognition
Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighbor phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...
Acoustic Models Based on Non-uniform Segments and Bidirectional Recurrent Neural Networks
In this paper a new framework for acoustic model building is presented. It is based on non-uniform segment models, which are learned and scored with a time bidirectional recurrent neural network. While usually neural networks in speech recognition systems are used to estimate posterior "frame to phoneme" probabilities, they are used here to estimate directly "segment to phoneme" probabilities, ...
Acoustic model building based on non-uniform segments and bidirectional recurrent neural networks
In this paper a new framework for acoustic model building is presented. It is based on non-uniform segment models, which are learned and scored with a time bidirectional recurrent neural network. While usually neural networks in speech recognition systems are used to estimate posterior "frame to phoneme" probabilities, they are used here to estimate directly "segment to phoneme" probabilities, ...